在本文中,我们解决了单眼散景合成的问题,我们试图从单个全焦点图像中呈现浅深度图像。与DSLR摄像机不同,由于移动光圈的物理限制,这种效果无法直接在移动摄像机中捕获。因此,我们提出了一种基于网络的方法,该方法能够从单个图像输入中渲染现实的单眼散景。为此,我们根据预测的单眼深度图引入了三个新的边缘感知散景损失,该图在模糊背景时锐化了前景边缘。然后,使用对抗性损失对该模型进行固定,从而产生逼真的玻璃效果。实验结果表明,我们的方法能够在处理复杂场景的同时产生令人愉悦的自然散景效果,并具有锋利的边缘。
translated by 谷歌翻译
间接飞行时间(ITOF)相机是一个有希望的深度传感技术。然而,它们容易出现由多路径干扰(MPI)和低信噪比(SNR)引起的错误。传统方法,在去噪后,通过估计编码深度的瞬态图像来减轻MPI。最近,在不使用中间瞬态表示的情况下,共同去噪和减轻MPI的数据驱动方法已经成为最先进的。在本文中,我们建议重新审视瞬态代表。使用数据驱动的Priors,我们将其插入/推断ITOF频率并使用它们来估计瞬态图像。给定直接TOF(DTOF)传感器捕获瞬态图像,我们将我们的方法命名为ITOF2DTOF。瞬态表示是灵活的。它可以集成与基于规则的深度感测算法,对低SNR具有强大,并且可以处理实际上出现的模糊场景(例如,镜面MPI,光学串扰)。我们在真正深度传感方案中展示了先前方法上的ITOF2DTOF的好处。
translated by 谷歌翻译
尽管人类可以通过利用对内容的高级理解的传统或最新学习的图像压缩编解码器来毫不费力地将复杂的视觉场景转变为简单的单词,而另一种方式似乎并没有利用视觉内容的语义含义。潜在的。此外,它们主要集中在率延伸上,并且在感知质量上的表现不佳,尤其是在低比特率方案中,并且常常无视下游计算机视觉算法的性能,这是一个快速增长的压缩图像的快速消费者组。在本文中,我们(1)提出了一个通用框架,该框架可以使任何图像编解码器能够利用高级语义,(2)研究感知质量和失真的关节优化。我们的想法是,鉴于任何编解码器,我们利用高级语义来增强其提取的低级视觉特征,并产生基本上的新的语义意识编解码器。我们提出了一个三相训练方案,该方案教授语义意识的编解码器来利用语义的力量来共同优化速率感知渗透率(R-PD)的性能。作为另一个好处,语义感知的编解码器还提高了下游计算机视觉算法的性能。为了验证我们的主张,我们进行了广泛的经验评估,并提供定量和定性结果。
translated by 谷歌翻译
最近,与常规像素的隐性表示相比,视频的图像隐式神经表示,其有希望的结果和迅速的速度因其有希望的结果和迅速的速度而受欢迎。但是,网络结构内的冗余参数在扩大理想性能时会导致大型模型大小。这种现象的关键原因是神经的耦合公式,该公式直接从框架索引输入中输出视频帧的空间和时间信息。在本文中,我们提出了E-NERV,它通过将图像的隐式神经代表分解为单独的空间和时间上下文来显着加快神经的速度。在这种新公式的指导下,我们的模型大大降低了冗余模型参数,同时保留表示能力。我们从实验上发现,我们的方法可以通过更少的参数改善性能,从而使收敛的速度更快地提高了$ 8 \ times $。代码可在https://github.com/kyleleey/e-nerv上找到。
translated by 谷歌翻译
最近,自我关注操作员将卓越的性能作为视觉模型的独立构建块。然而,现有的自我关注模型通常是手动设计的,从CNN修改,并仅通过堆叠一个操作员而获得。很少探索相结合不同的自我关注操作员和卷积的更广泛的建筑空间。在本文中,我们探讨了具有权重共享神经结构搜索(NAS)算法的新颖建筑空间。结果架构被命名为Triomet,用于组合卷积,局部自我关注和全球(轴向)自我关注操作员。为了有效地搜索在这个巨大的建筑空间中,我们提出了分层采样,以便更好地培训超空网。此外,我们提出了一种新的重量分享策略,多头分享,专门针对多头自我关注运营商。我们搜索的Tri of将自我关注和卷积相结合优于所有独立的模型,在想象网分类上具有较少的拖鞋,自我关注比卷积更好。此外,在各种小型数据集上,我们观察对自我关注模型的劣等性能,但我们的小脚仍然能够匹配这种情况下的最佳操作员,卷积。我们的代码可在https://github.com/phj128/trionet提供。
translated by 谷歌翻译
Deep learning models can achieve high accuracy when trained on large amounts of labeled data. However, real-world scenarios often involve several challenges: Training data may become available in installments, may originate from multiple different domains, and may not contain labels for training. Certain settings, for instance medical applications, often involve further restrictions that prohibit retention of previously seen data due to privacy regulations. In this work, to address such challenges, we study unsupervised segmentation in continual learning scenarios that involve domain shift. To that end, we introduce GarDA (Generative Appearance Replay for continual Domain Adaptation), a generative-replay based approach that can adapt a segmentation model sequentially to new domains with unlabeled data. In contrast to single-step unsupervised domain adaptation (UDA), continual adaptation to a sequence of domains enables leveraging and consolidation of information from multiple domains. Unlike previous approaches in incremental UDA, our method does not require access to previously seen data, making it applicable in many practical scenarios. We evaluate GarDA on two datasets with different organs and modalities, where it substantially outperforms existing techniques.
translated by 谷歌翻译
The development of social media user stance detection and bot detection methods rely heavily on large-scale and high-quality benchmarks. However, in addition to low annotation quality, existing benchmarks generally have incomplete user relationships, suppressing graph-based account detection research. To address these issues, we propose a Multi-Relational Graph-Based Twitter Account Detection Benchmark (MGTAB), the first standardized graph-based benchmark for account detection. To our knowledge, MGTAB was built based on the largest original data in the field, with over 1.55 million users and 130 million tweets. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diversified relations. In MGTAB, we extracted the 20 user property features with the greatest information gain and user tweet features as the user features. In addition, we performed a thorough evaluation of MGTAB and other public datasets. Our experiments found that graph-based approaches are generally more effective than feature-based approaches and perform better when introducing multiple relations. By analyzing experiment results, we identify effective approaches for account detection and provide potential future research directions in this field. Our benchmark and standardized evaluation procedures are freely available at: https://github.com/GraphDetec/MGTAB.
translated by 谷歌翻译
As one of the prevalent methods to achieve automation systems, Imitation Learning (IL) presents a promising performance in a wide range of domains. However, despite the considerable improvement in policy performance, the corresponding research on the explainability of IL models is still limited. Inspired by the recent approaches in explainable artificial intelligence methods, we proposed a model-agnostic explaining framework for IL models called R2RISE. R2RISE aims to explain the overall policy performance with respect to the frames in demonstrations. It iteratively retrains the black-box IL model from the randomized masked demonstrations and uses the conventional evaluation outcome environment returns as the coefficient to build an importance map. We also conducted experiments to investigate three major questions concerning frames' importance equality, the effectiveness of the importance map, and connections between importance maps from different IL models. The result shows that R2RISE successfully distinguishes important frames from the demonstrations.
translated by 谷歌翻译
Compressed videos often exhibit visually annoying artifacts, known as Perceivable Encoding Artifacts (PEAs), which dramatically degrade video visual quality. Subjective and objective measures capable of identifying and quantifying various types of PEAs are critical in improving visual quality. In this paper, we investigate the influence of four spatial PEAs (i.e. blurring, blocking, bleeding, and ringing) and two temporal PEAs (i.e. flickering and floating) on video quality. For spatial artifacts, we propose a visual saliency model with a low computational cost and higher consistency with human visual perception. In terms of temporal artifacts, self-attention based TimeSFormer is improved to detect temporal artifacts. Based on the six types of PEAs, a quality metric called Saliency-Aware Spatio-Temporal Artifacts Measurement (SSTAM) is proposed. Experimental results demonstrate that the proposed method outperforms state-of-the-art metrics. We believe that SSTAM will be beneficial for optimizing video coding techniques.
translated by 谷歌翻译
We propose a distributionally robust return-risk model for Markov decision processes (MDPs) under risk and reward ambiguity. The proposed model optimizes the weighted average of mean and percentile performances, and it covers the distributionally robust MDPs and the distributionally robust chance-constrained MDPs (both under reward ambiguity) as special cases. By considering that the unknown reward distribution lies in a Wasserstein ambiguity set, we derive the tractable reformulation for our model. In particular, we show that that the return-risk model can also account for risk from uncertain transition kernel when one only seeks deterministic policies, and that a distributionally robust MDP under the percentile criterion can be reformulated as its nominal counterpart at an adjusted risk level. A scalable first-order algorithm is designed to solve large-scale problems, and we demonstrate the advantages of our proposed model and algorithm through numerical experiments.
translated by 谷歌翻译